Constructing a biodiversity terminological inventory
Abstract
The rapid growth of the biodiversity literature presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large volumes of text. An important step in text mining is the recognition of concepts via their linguistic realisations, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, so search systems are likely to overlook documents that mention lesser-known variants even though those documents are relevant to the query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques to all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically, by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, for incorporation into the search query.
An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Exploiting the inventory in this way can benefit both users and developers of search engines and text mining applications.
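To make the idea of distributional similarity and query expansion concrete, the following is a minimal sketch and not the authors' method. It assumes species names have already been recognised and joined into single tokens, represents each name by counts of the words around it, and ranks other names by cosine similarity. The function names (`context_vectors`, `cosine`, `expand_query`), the window size and the three toy sentences are all invented for illustration; the abstract does not say which distributional models were actually used.

```python
import math
from collections import Counter, defaultdict


def context_vectors(docs, names, window=2):
    """Count the words appearing within `window` tokens of each known name."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in names:
                lo = max(0, i - window)
                context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                vectors[tok].update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_query(name, vectors, top_k=2):
    """Return up to `top_k` other names ranked by similarity to `name`."""
    target = vectors.get(name, Counter())
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != name]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, score in scored[:top_k] if score > 0]


# Toy corpus (hypothetical sentences, not taken from the Biodiversity Heritage Library).
docs = [
    "the lion panthera_leo hunts large prey on the savanna",
    "the lion felis_leo hunts large prey in africa",
    "the oak quercus_robur grows slowly in temperate forest",
]
names = {"panthera_leo", "felis_leo", "quercus_robur"}

vectors = context_vectors(docs, names)
print(expand_query("panthera_leo", vectors, top_k=1))
```

In this toy setting the older name `felis_leo` shares its whole context window with `panthera_leo`, so it is ranked first as an expansion candidate. On a real corpus such suggestions would still need validation by expert curators, as the abstract notes.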
Similar resources
Searching for and identifying conceptual relationships via a corpus-based approach to a Terminological Knowledge Base (CTKB) Method and Results
The aim of this paper is to provide an effective method for constructing a Corpus-based Terminological Knowledge Base (CTKB). The two stages of the method, both based on linguistic knowledge, are outlined: the identification of candidate terms and the identification of conceptual relationships. This method is applied to a French corpus and the results are assessed from the point of view of various appl...
Investigation of the Effect of Constructing Small Arc Basins System on Vegetation Composition and Biodiversity in Aridland Ecosystems in the East of Iran (Case study: Rangelands of Sarbisheh, South Khorasan Province)
Introduction: One way to restore and reclaim damaged rangeland is to use rain-harvesting methods such as pitting, contour furrowing, flood spreading and small arc basin systems. Along with reducing runoff, these methods increase soil moisture content and thus vegetation cover. Biodiversity is most commonly used to describe the number of species....
Defining a Gold Standard for the Evaluation of Term Extractors
We describe a methodology for constructing a gold standard for the automatic evaluation of term extractors, an important step toward establishing a much-needed evaluation protocol for term extraction systems. The gold standard proposed is a fully annotated corpus, constructed in accordance with a specific terminological setting (i.e. the compilation of a specialized dictionary of automotive mec...
Climate Change and Forest Biodiversity in the Eastern United States: Insights from Inventory Data
Joining Inventory by Parataxonomists with DNA Barcoding of a Large Complex Tropical Conserved Wildland in Northwestern Costa Rica
BACKGROUND: The many components of conservation through biodiversity development of a large complex tropical wildland, Area de Conservacion Guanacaste (ACG), thrive on knowing what its biodiversity and natural history are. For 32 years a growing team of Costa Rican parataxonomists has conducted biodiversity inventory of ACG caterpillars, their food plants, and their parasitoids. In 2003, DNA barc...
Volume: 12
Publication date: 2017